
# Impact of Spintronics Transducers on the Performance of Spin Wave Logic Circuit

#### Conference Paper · August 2016

DOI: 10.1109/NANO.2016.7751399



Sourav Dutta<sup>1</sup>, Rouhollah Mousavi Iraei<sup>1</sup>, Chenyun Pan<sup>1</sup>, Dmitri E. Nikonov<sup>2</sup>, Sasikanth Manipatruni<sup>2</sup>, Ian A. Young<sup>2</sup> and Azad Naeemi<sup>1</sup>

*Abstract*—We propose a comprehensive scheme for building spintronics transducers that convert signals back and forth between the spin and charge domains, together with the associated CMOS peripheral circuitry for spin wave device (SWD) circuits. We systematically analyze the impact of the transducers on SWD performance in terms of energy and area overhead. At the circuit level, SWD compared with 7nm FinFET CMOS technology yields 3x smaller area and an order of magnitude lower energy-delay product (EDP) for large complex circuits with low overhead. Overall, the advantage of SWD over CMOS grows with the ratio of logic size to transducer overhead.

## I. INTRODUCTION

The scaling of CMOS integrated circuits is expected to reach its end due to fundamental physical limitations. This has led to an active search for alternative state variables and for new hybrid systems that combine CMOS and beyond-CMOS technologies to improve system performance. Spin-based devices have been among the forerunners in this search owing to their inherent nonvolatility, higher logical efficiency, and significant departure from the von Neumann architecture by enabling memory-in-logic and logic-in-memory architectures [1].

Spin waves combined with the magnetoelectric (ME) effect have been pursued for information transmission and computation because their energy dissipation is comparable to or lower than that of charge-based devices [1], [2], [3]. Recently, an extensive design and benchmarking methodology for spin wave devices (SWD) was presented in [4]. However, its underlying operating principle is based on [2], which did not fully address some essential requirements of logic applications such as concatenability and non-reciprocity. Also, although a micromagnetic approach was adopted, the modeling of the voltage-induced change in anisotropy at the input and of the voltage output needs to be better elucidated. Finally, previous works have been dedicated to designing energy-efficient non-volatile SWDs with innovative logic synthesis methodologies based on the Majority-Inverter Graph (MIG) [5] that fully exploit SWD functionality. However, efficient and scalable mechanisms for information transduction between the spin and charge domains remain unexplored.

In this work, we present a comprehensive scheme for building efficient spintronics transducers for SWD, including the associated CMOS driving and interfacing circuits. We provide a thorough analysis of SWD logic circuits by combining our previously proposed design for non-volatile clocked SWD [3] with the transducers presented herein. We perform micromagnetic and HSPICE simulations to estimate the energy and area overhead introduced by the input/output transducers. We then evaluate the performance of SWD against 7nm FinFET CMOS technology at the circuit level in terms of area and the energy-delay product (EDP).

## II. CIRCUIT ARCHITECTURE

Fig. 1 shows a general pipeline architecture of an SWD logic circuit. It consists of the following blocks:

- a charge-to-spin (CS) converter block that translates information from the charge domain to the spin domain;
- intermediate ME cells that act as computational blocks;
- a spin-to-charge (SC) converter block that transduces information from the spin state to a charge variable;
- a 3-phase clock that ensures sequential transmission of information and non-reciprocity.



Fig. 1. Illustration of an SWD logic circuit architecture with CMOS peripheral circuitry.

## III. BUILDING BLOCKS

#### A. Magnetoelectric cells

The basic computational blocks in an SWD logic circuit are the ME cells. Each consists of a ferroelectric or piezoelectric (FE) layer sandwiched between two metallic electrodes, plus a ferromagnetic (FM) layer, as shown in Fig. 2a together with the clocking circuit. As the clock $V_{clk}$ toggles between 0 and $V_{DD}$, the ME cell is charged through the top nFET and discharged through the bottom nFET. The voltage $V_{ME}$ applied across the FE creates an isotropic strain $\varepsilon = \varepsilon_0 + d_{31}V_{ME}/t_{FE}$. Here $\varepsilon_0$ is the residual strain, and $t_{FE}$ and $d_{31}$ are the thickness and piezoelectric coefficient of the FE, respectively. The strain couples to the upper FM and creates a strain-mediated out-of-plane magnetic anisotropy $K = \frac{3}{2}\lambda Y\varepsilon$, where $\lambda$ and $Y$ are the magnetostrictive coefficient and Young's modulus of the FM. Above a critical strain, the magnetic easy axis rotates out of plane. The magnetization then switches out of plane and launches spin waves, with the information encoded in the phase of the waves. Once out of plane, the magnetization is held in this metastable state by the applied voltage until the incoming spin waves from the previous stage arrive. On their arrival, $V_{clk}$ is turned off, which causes a phase-dependent deterministic switching of the ME cell. Details of the working principle are described in [3], [6].

<sup>1</sup>School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA sdutta38@gatech.edu

<sup>2</sup>Components Research, Intel Corporation, Hillsboro, OR 97124 USA

Fig. 2. Schematic of the various building blocks in an SWD logic circuit: (a) single-stage SWD consisting of ME cells and a PMA spin wave bus (SWB), (b,c) CS converters using MTJ-based STT and SHE, respectively, (d-f) SC converters using MTJ readout, ME read-head and ISHE/IREE, respectively.
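The Python sketch below makes the ME-cell relations for $\varepsilon$ and $K$ concrete. It evaluates them with the material parameters listed in Section IV ($d_{31} = -1000$ pm/V, $t_{FE} = 30$ nm, $\lambda = 200$ ppm, $Y = 200$ GPa) at the quoted operating voltage of 0.1 V. The residual strain $\varepsilon_0$ is not given in the text, so it defaults to zero here as an assumption. The function names are illustrative only.

```python
"""Sketch: voltage-induced strain and magnetoelastic anisotropy of an ME cell.

Assumption: the residual strain eps0 defaults to 0 (not given in the paper).
"""

D31 = -1000e-12   # piezoelectric coefficient d31 of PMN-PT [m/V]
T_FE = 30e-9      # ferroelectric thickness t_FE [m]
LAMBDA = 200e-6   # magnetostrictive coefficient of CoFe [dimensionless]
YOUNG = 200e9     # Young's modulus of CoFe [Pa]


def strain(v_me: float, eps0: float = 0.0) -> float:
    """Isotropic strain eps = eps0 + d31 * V_ME / t_FE."""
    return eps0 + D31 * v_me / T_FE


def anisotropy(eps: float) -> float:
    """Strain-mediated anisotropy K = (3/2) * lambda * Y * eps [J/m^3]."""
    return 1.5 * LAMBDA * YOUNG * eps


if __name__ == "__main__":
    eps = strain(0.1)  # V_ME = 0.1 V, the operating voltage quoted in Sec. IV
    k = anisotropy(eps)
    print(f"strain = {eps:.3e}, K = {k / 1e3:.1f} kJ/m^3")
```

With these inputs the strain is about $-3.3 \times 10^{-3}$ and $|K| \approx 200$ kJ/m³. Whether this $K$ favors the out-of-plane axis depends on the sign conventions for $\lambda$ and $d_{31}$, which this sketch does not try to resolve.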

#### B. Charge-to-Spin converters

The first stage of the SWD logic circuit consists of CS converters. These translate information from the charge domain to the spin domain by switching the nanomagnet from a metastable state to one of its low-energy magnetization states. The switching is driven by a voltage- or current-dependent external torque, such as the transfer of angular momentum to the nanomagnet by spin currents. We investigate two potential candidates: (a) a magnetic tunnel junction (MTJ) based structure relying on spin-transfer torque (STT) [3] and (b) the spin Hall effect (SHE) and/or Rashba effect.

Fig. 2b shows an STT-based CS converter consisting of two MTJ stacks built on top of the ME cell, with the fixed ferromagnetic layers pinned in opposite directions $(\pm m_x)$. When $V_{clk}$ of the ME clock goes low, the write enable signal $V_{WE}$ in the writing circuit is turned on. If the data signal $V_{Data}$ is high (bit "1"), $N_1$ and $N_4$ are turned on; if it is low (bit "0"), $N_2$ and $N_3$ are turned on. This injects current $I_1$ or $I_2$, respectively, through the MTJ. The current is spin polarized by the pinned layer, $I_s = I_1/\eta$, and switches the free FM layer from +z to $\pm x$.

A more promising alternative is to use the SHE. Fig. 2c shows the proposed structure, which consists of an SHE material with high spin-orbit coupling (SOC) and an insulating FM, both built on top of the ME cell. The insulating FM absorbs the spin torque while preventing current from shunting into the bottom metallic layers. The SOC creates a transverse spin-polarized current $I_s$ when a charge current $I_c$ flows through the material. While $V_{WE}$ is high, $I_c$ flows through the giant SHE (GSHE) material along the $\pm y$ direction, depending on $V_{Data}$. This injects a $\mp x$ polarized spin current along the $-z$ direction [7], [8]:

$I_s = \frac{w}{t}\,\theta_{SH}\left[1-\operatorname{sech}\left(\frac{t}{\lambda_s}\right)\right]\frac{2G^{\uparrow\downarrow}\tanh\left(\frac{t}{2\lambda_s}\right)}{2G^{\uparrow\downarrow}\coth\left(\frac{t}{\lambda_s}\right)+\frac{\sigma}{\lambda_s}}\,I_c$

Here w is the width of the FM, $\theta_{SH}$ is the internal spin Hall angle (SHA), t, $\lambda_s$ and $\sigma$ are the thickness, spin diffusion length and conductivity of the SHE metal, respectively, and $G^{\uparrow\downarrow}$ is the interfacial spin mixing conductance.
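The Python sketch below evaluates the spin-current expression term by term for the Pt parameters used in Section IV: $\theta_{SH} = 0.1$, $\sigma = 3.2 \times 10^6\ \Omega^{-1}m^{-1}$, $G^{\uparrow\downarrow} = 0.57 \times 10^{15}\ \Omega^{-1}m^{-2}$, $\lambda_s = 1.4$ nm and t = 3.4 nm. The text does not state the FM width w. Here w = 80 nm, the long side of the ME cell, is a hypothetical choice, and the function names are illustrative only.

```python
"""Sketch: charge-to-spin conversion ratio I_s / I_c of the SHE writer,
evaluating the expression in the text with the Pt parameters of Sec. IV.

Assumption: FM width w = 80 nm is hypothetical (not stated in the paper).
"""
import math

THETA_SH = 0.1      # internal spin Hall angle of Pt
SIGMA = 3.2e6       # conductivity of Pt [1/(Ohm m)]
G_MIX = 0.57e15     # interfacial spin mixing conductance [1/(Ohm m^2)]
LAMBDA_S = 1.4e-9   # spin diffusion length [m]
T_PT = 3.4e-9       # Pt thickness [m]


def interface_factor(t: float = T_PT) -> float:
    """G-dependent factor: 2G tanh(t/2ls) / (2G coth(t/ls) + sigma/ls)."""
    num = 2 * G_MIX * math.tanh(t / (2 * LAMBDA_S))
    den = 2 * G_MIX / math.tanh(t / LAMBDA_S) + SIGMA / LAMBDA_S  # coth = 1/tanh
    return num / den


def effective_sha(t: float = T_PT) -> float:
    """theta_SH * [1 - sech(t/ls)] * interface factor."""
    sech = 1.0 / math.cosh(t / LAMBDA_S)
    return THETA_SH * (1.0 - sech) * interface_factor(t)


def spin_current_ratio(w: float, t: float = T_PT) -> float:
    """I_s / I_c = (w / t) * effective spin Hall angle."""
    return (w / t) * effective_sha(t)


if __name__ == "__main__":
    w = 80e-9  # hypothetical FM width [m]
    print(f"effective SHA = {effective_sha():.4f}")
    print(f"I_s / I_c (w = 80 nm) = {spin_current_ratio(w):.3f}")
```

Under these assumptions the product of $\theta_{SH}$, the sech factor and the $G^{\uparrow\downarrow}$-dependent factor is about 0.023, and $I_s/I_c \approx 0.54$ for w = 80 nm. The geometric prefactor is w/t, so for a fixed SHE thickness the delivered spin current grows in proportion to the magnet width.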

#### C. Spin-to-Charge converters

Identifying an efficient spin-to-charge transduction mechanism is crucial for beyond-CMOS spintronic devices. Here we discuss some feasible options for spin-to-charge (SC) converters:

- (a) a high tunnel magnetoresistance (TMR) MTJ stack;
- (b) a magnetoelectric (ME) read-head [9];
- (c) the inverse spin Hall effect (ISHE) and interface Rashba-Edelstein effect (IREE) [10], [11].

We consider an MTJ stack with high TMR built on top of the ME cell, together with the CMOS read-out circuitry shown in Fig. 2d. The same clock $V_{clk}$ that clocks the ME cell is applied to the gate of the pMOS. Read-out is performed when $V_{clk}$ is low and the magnetization is in one of the in-plane states $(\pm m_x)$, storing either bit "1" or "0". A sense current of a few $\mu A$ flows through the MTJ and through a fixed resistance r provided by a matched MTJ. The MTJ resistance is either $R_P$ (parallel configuration) or $R_{AP}$ (antiparallel configuration), so the nodal voltage $V_N$ swings between a high and a low value. The proposed scheme offers two advantages.

- (a) Unlike STT-RAM, which uses the same MTJ for reading and writing, we use the MTJ for read-out only. We can therefore use a thicker oxide that increases the TMR, and the output voltage can directly drive a CMOS interfacing circuit without a bulky sense amplifier (SA).
- (b) Because a matched MTJ provides the series resistance, the circuit remains reliable even when the oxide thickness varies.
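The read-out node can be pictured as a resistive divider. The sketch below assumes a topology that the text does not fully specify: the sense MTJ sits between the supply and $V_N$ (pMOS on-resistance neglected), and the matched reference r sits between $V_N$ and ground. It takes TMR = 277%, $RA_P = 1060\ \Omega\mu m^2$ and the 0.4 V operating voltage from Section IV. The MTJ area, equal to the 80 nm × 40 nm ME footprint, is hypothetical.

The sketch picks $r = \sqrt{R_P R_{AP}}$ because that value maximizes the swing $V_N(P) - V_N(AP) \propto r/((R_P+r)(R_{AP}+r))$. The derivative of that expression vanishes when $r^2 = R_P R_{AP}$.

```python
"""Sketch: MTJ read-out modeled as a resistive divider.

Assumptions (not from the paper): MTJ between supply and node V_N, matched
reference resistance r between V_N and ground, pMOS on-resistance neglected,
MTJ area = 80 nm x 40 nm, and r = sqrt(R_P * R_AP).
"""
import math

RA_P = 1060.0           # resistance-area product in P state [Ohm um^2]
TMR = 2.77              # 277 % tunnel magnetoresistance
AREA_UM2 = 0.08 * 0.04  # hypothetical MTJ area [um^2]
V_SUPPLY = 0.4          # operating voltage [V]


def mtj_resistances(ra_p=RA_P, tmr=TMR, area=AREA_UM2):
    """Return (R_P, R_AP) with R_AP = R_P * (1 + TMR)."""
    r_p = ra_p / area
    return r_p, r_p * (1.0 + tmr)


def node_voltage(r_mtj, r_ref, v=V_SUPPLY):
    """Divider output V_N = V * r_ref / (r_mtj + r_ref)."""
    return v * r_ref / (r_mtj + r_ref)


if __name__ == "__main__":
    r_p, r_ap = mtj_resistances()
    r_ref = math.sqrt(r_p * r_ap)  # maximizes V_N(P) - V_N(AP)
    v_p = node_voltage(r_p, r_ref)
    v_ap = node_voltage(r_ap, r_ref)
    print(f"R_P = {r_p / 1e3:.0f} kOhm, R_AP = {r_ap / 1e3:.0f} kOhm")
    print(f"V_N(P) = {v_p:.3f} V, V_N(AP) = {v_ap:.3f} V, swing = {v_p - v_ap:.3f} V")
```

Under these assumptions the swing is roughly 0.13 V on a 0.4 V supply. With $r = \sqrt{R_P R_{AP}}$ the two output levels sum to the supply voltage, so they sit symmetrically about V/2. That symmetry is convenient for the switching threshold of an interfacing inverter.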

An alternative is an ME read-head, shown in Fig. 2e. It comprises a ferroelectric layer sandwiched between two pinned ferromagnetic/antiferromagnetic layers that provide a dc exchange-bias magnetic field. When the magnetization in the lower FM layer switches, its dipolar magnetic field H induces an output voltage $V = \alpha H t$, where $\alpha$ is the magnetoelectric coefficient of the material stack and t is the stack thickness. This induced voltage is small and would require an amplifier to interface with the CMOS circuit.

Fig. 2f shows the schematic of an SC converter using ISHE/IREE. The electrical current injected through the FM produces spin accumulation at the interface, which gives rise to a pure spin current in a channel with high SOC. Through the bulk ISHE and/or the IREE, the injected spin current produces a charge current $I_c = \left[\theta_{SH}\lambda_s \tanh\left(\frac{t}{2\lambda_s}\right) + \lambda_{IREE}\right]I_s/w$, where $\lambda_{IREE}$ is the IREE length. Depending on the magnetization of the FM, $I_c$ either charges or discharges the capacitor. With known materials and optimized widths, ISHE/IREE gives a spin-to-charge conversion efficiency of 0.1 and a net output voltage $V_N$ of around 0.1 V [12]. It would therefore require amplification to interface with the CMOS circuit.
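The conversion efficiency $I_c/I_s$ implied by the expression above is an effective length divided by the width w. The Python sketch below evaluates it with the Pt parameters of Section IV. Both w and $\lambda_{IREE}$ are left as free, hypothetical inputs because the text gives no values for them, and the function names are illustrative.

```python
"""Sketch: spin-to-charge efficiency
I_c / I_s = [theta_SH * ls * tanh(t / 2ls) + lambda_IREE] / w.

Assumptions: Pt parameters from Sec. IV; w and lambda_IREE are free,
hypothetical inputs (not given in the paper).
"""
import math

THETA_SH = 0.1      # spin Hall angle of Pt
LAMBDA_S = 1.4e-9   # spin diffusion length [m]
T_PT = 3.4e-9       # Pt thickness [m]


def conversion_efficiency(w, lambda_iree=0.0, theta_sh=THETA_SH,
                          lambda_s=LAMBDA_S, t=T_PT):
    """Return I_c / I_s for a channel of width w [m]."""
    l_eff = theta_sh * lambda_s * math.tanh(t / (2 * lambda_s)) + lambda_iree
    return l_eff / w


def width_for_efficiency(target, lambda_iree=0.0, **kw):
    """Width w [m] at which I_c / I_s reaches `target` (efficiency ~ 1/w)."""
    return conversion_efficiency(1.0, lambda_iree, **kw) / target


if __name__ == "__main__":
    print(f"efficiency at w = 80 nm (bulk Pt only): {conversion_efficiency(80e-9):.2e}")
    print(f"width for efficiency 0.1 (bulk Pt only): {width_for_efficiency(0.1) * 1e9:.2f} nm")
```

For bulk Pt alone the effective length is only about 0.12 nm. Reaching the quoted efficiency of 0.1 would therefore need a channel roughly 1.2 nm wide. Because the efficiency scales as 1/w, this makes plain why a sizable $\lambda_{IREE}$ or a larger spin Hall angle, together with optimized widths, matters for this converter.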

## IV. PERFORMANCE EVALUATION

We perform micromagnetic simulations in OOMMF [13] for a single-stage SWD. The ME cell is an $80nm \times 40nm \times 12nm$ Co<sub>60</sub>Fe<sub>40</sub>/(001) PMN-PT (30nm thick) heterostructure ($M_S = 800$ kA/m, A = 20 pJ/m, $\alpha = 0.027$, $\lambda = 200$ ppm, Y = 200 GPa, $d_{31} = -1000$ pm/V). The perpendicular magnetic anisotropy (PMA) spin wave bus is a 100nm long Co/Ni multilayer ($M_S = 790$ kA/m, A = 16 pJ/m, $H_K = 16.78$ kA/m, $\alpha = 0.01$).

The voltage-induced, strain-mediated out-of-plane anisotropy required for magnetization switching gives a low operating voltage $V_{ME}$ of 0.1V. The corresponding intrinsic energy dissipation, $\frac{1}{2}\frac{\epsilon_0 \varepsilon_r A_{ME}}{t_{FE}}V_{ME}^2$, is 4.5aJ, where $\varepsilon_r$ is the relative permittivity of the FE and $A_{ME}$ is the area of the ME cell. The switching delay is kept to a minimum of 0.4-0.6ns, with a spin wave propagation time of 0.2ns. Following the clocking design in [3], the single-stage delay is 2.4ns.

We perform HSPICE simulations of the clocking circuit shown in Fig. 2a, representing the ME cell as a capacitance of $0.15 fF/\mu m$. The resulting energy dissipation is 0.4fJ, about 100x larger than the intrinsic energy dissipation of the ME cell.
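The intrinsic energy expression above is the parallel-plate capacitor energy $\frac{1}{2}CV_{ME}^2$ with $C = \epsilon_0\varepsilon_r A_{ME}/t_{FE}$. The Python sketch below inverts it to show the capacitance and relative permittivity implied by the quoted 4.5 aJ at 0.1 V. It assumes, hypothetically, that $A_{ME}$ equals the 80 nm × 40 nm cell footprint, since the text gives no numerical value for $A_{ME}$ or $\varepsilon_r$.

```python
"""Sketch: capacitance and permittivity implied by
E = (1/2) * eps0 * eps_r * A_ME / t_FE * V_ME^2 = 4.5 aJ at V_ME = 0.1 V.

Assumption: A_ME is the 80 nm x 40 nm cell footprint (hypothetical reading).
"""
EPS0 = 8.854e-12      # vacuum permittivity [F/m]
A_ME = 80e-9 * 40e-9  # assumed capacitor area [m^2]
T_FE = 30e-9          # PMN-PT thickness [m]


def switching_energy(eps_r, v_me, area=A_ME, t_fe=T_FE):
    """Intrinsic capacitive energy (1/2) C V^2 with C = eps0 eps_r A / t."""
    c = EPS0 * eps_r * area / t_fe
    return 0.5 * c * v_me ** 2


def implied_capacitance(energy, v_me):
    """C = 2E / V^2, independent of any geometry assumption."""
    return 2.0 * energy / v_me ** 2


def implied_eps_r(energy, v_me, area=A_ME, t_fe=T_FE):
    """eps_r such that switching_energy(eps_r, v_me) equals `energy`."""
    return implied_capacitance(energy, v_me) * t_fe / (EPS0 * area)


if __name__ == "__main__":
    c = implied_capacitance(4.5e-18, 0.1)
    print(f"C = {c * 1e15:.2f} fF, eps_r = {implied_eps_r(4.5e-18, 0.1):.0f}")
```

Under the footprint assumption, C ≈ 0.9 fF and $\varepsilon_r \approx 950$. A different effective capacitor area would rescale $\varepsilon_r$ but not C, which follows from the energy and voltage alone.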

For the CS converter, we use the SHE of Pt ($\theta_{SH} = 0.1$, $\sigma = 3.2 \times 10^6 \Omega^{-1} m^{-1}$, $G^{\uparrow\downarrow} = 0.57 \times 10^{15} \Omega^{-1} m^{-2}$, $\lambda_s = 1.4$ nm and t = 3.4 nm). Micromagnetic simulation gives a delay of 1.3ns. HSPICE simulation of the writing circuit, with Pt represented as a resistance of 45.58 $\Omega$, yields 1fJ/write. This is 20x larger than the intrinsic GSHE energy dissipation of 50.6aJ obtained from micromagnetic simulation. We use a $V_{WE}$ of 40mV, which can be lowered further at the cost of larger driving transistors.

We run similar HSPICE simulations for the MTJ-based SC converter. The $Co_{40}Fe_{40}B_{20}/MgO(1.5 \text{ nm})/CoFe$ stack has a high TMR of 277% and a resistance-area product $(RA_P)$ of 1060 $\Omega\mu m^2$ [14]. At an operating voltage of 0.4V, the energy dissipation is 2.2fJ with an inverter and 2.3fJ with a D-latch as the interfacing CMOS circuit. This is nearly 10x higher than the intrinsic energy dissipation of 240aJ in the read-out MTJ stack.
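The ratios quoted in this section follow directly from the stated energies. The Python sketch below recomputes them side by side using only numbers given above: about 89x for the ME clocking circuit, about 20x for the SHE writer, and roughly 9x to 10x for the MTJ read-out. Side by side, the CMOS periphery rather than the spintronic device dominates each block's energy.

```python
"""Sketch: ratio of circuit-level (HSPICE) to intrinsic energy per block,
using only the energies quoted in Sec. IV."""

AJ, FJ = 1e-18, 1e-15

# block: (circuit-level energy [J], intrinsic energy [J])
BLOCKS = {
    "ME cell + clocking": (0.4 * FJ, 4.5 * AJ),
    "SHE writer (CS)": (1.0 * FJ, 50.6 * AJ),
    "MTJ read-out (SC), inverter": (2.2 * FJ, 240 * AJ),
    "MTJ read-out (SC), D-latch": (2.3 * FJ, 240 * AJ),
}


def overhead_ratios(blocks=BLOCKS):
    """Return {block: circuit energy / intrinsic energy}."""
    return {name: e_circ / e_int for name, (e_circ, e_int) in blocks.items()}


if __name__ == "__main__":
    for name, ratio in overhead_ratios().items():
        print(f"{name:30s} {ratio:6.1f}x")
```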



Fig. 3. Energy overhead imposed by transducers.

To estimate the energy overhead of the transducers, we consider a logic block of size N (number of ME cells) and the 3-phase clocking scheme shown in Fig. 1. An Elmore delay calculation that includes interconnect resistance (20 $\Omega/\mu m$) and capacitance shows that each clock can drive up to approximately 250 ME cells. Fig. 3 shows the % energy overhead for $N_{i/p}$ input, $N_{o/p}$ output and $N_{i/p} + N_{o/p}$ input-output transducers. The % overhead decreases as the logic block (number of ME cells) grows. However, even for a block of $2 \times 10^4$ cells with 32-bit input and output (64 transducers in total), the energy overhead remains at 25%, and it becomes even worse for 64 or 128 bits.

For the delay estimation, consider a program with a large number of instructions that execute without data dependencies. Such a program can exploit the pipeline structure of the SWD block by fetching input data at every clock cycle. The overall execution time per instruction depends on the application. In the optimal case, the delay equals the clock cycle period of 2.4ns, which applies to all types of instructions.
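The fan-out limit above comes from an Elmore delay estimate. The Python sketch below shows the general form of such an estimate for a clock line modeled as a uniform RC ladder. The driver resistance sees all the load capacitance, and each wire segment sees the capacitance of every cell downstream of it.

Only the 20 Ω/µm wire resistance is taken from the text. The wire capacitance, cell pitch, per-cell load, driver resistance and delay budget are hypothetical placeholders, so the resulting cell count is illustrative and is not expected to reproduce the 250-cell figure.

```python
"""Sketch: Elmore delay of a clock line driving N ME cells (uniform RC ladder).

Only R_WIRE (20 Ohm/um) comes from the paper; every other value below is a
hypothetical placeholder chosen for illustration.
"""

R_WIRE = 20.0       # interconnect resistance [Ohm/um] (from the text)
C_WIRE = 0.2e-15    # hypothetical wire capacitance [F/um]
PITCH = 0.1         # hypothetical cell pitch along the clock line [um]
C_CELL = 1e-15      # hypothetical load per ME cell [F]
R_DRV = 1e3         # hypothetical clock-driver on-resistance [Ohm]


def elmore_delay(n, r_drv=R_DRV, r_wire=R_WIRE, c_wire=C_WIRE,
                 pitch=PITCH, c_cell=C_CELL):
    """Elmore delay to the far end: sum over resistors of R * downstream C."""
    r_seg = r_wire * pitch
    c_node = c_cell + c_wire * pitch
    tau = r_drv * n * c_node              # driver resistance charges every node
    for i in range(1, n + 1):             # segment i feeds nodes i..n
        tau += r_seg * (n - i + 1) * c_node
    return tau


def max_cells(budget, **kw):
    """Largest N whose Elmore delay stays within `budget` seconds."""
    n = 0
    while elmore_delay(n + 1, **kw) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    print("cells per clock driver:", max_cells(100e-12))
```

Replacing the placeholders with extracted driver and load values, and tying the budget to the clock phase, is how this kind of estimate would size the number of cells per clock driver.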



Fig. 4. Area overhead imposed by transducers.

Following [1], the area of a CMOS inverter is calculated as $160F^2$, where the minimum feature size F = 7nm. The area of an nFET is calculated as $12F^2 \times N_f \times 1.5$, where $N_f$ is the number of fins, assuming a fin pitch of 3F, a contact pitch of 4F and 50% area overhead. Fig. 4 shows the % area overhead for $N_{i/p}$ input, $N_{o/p}$ output and $N_{i/p} + N_{o/p}$ input-output transducers. The area overhead follows the same trend as the energy overhead, but it is less severe.
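The area rules above reduce to simple arithmetic in F. The $12F^2$ per fin is the product of the 3F fin pitch and the 4F contact pitch, and the factor 1.5 is the 50% overhead. The Python sketch below evaluates both rules at F = 7 nm. The function names are illustrative.

```python
"""Sketch: 7nm area estimates from the rules in the text.

inverter = 160 F^2; nFET = (3F fin pitch x 4F contact pitch) x N_f x 1.5.
"""
F_NM = 7.0  # minimum feature size [nm]


def inverter_area(f=F_NM):
    """CMOS inverter area [nm^2], 160 F^2 following [1]."""
    return 160 * f ** 2


def nfet_area(n_fins, f=F_NM, overhead=0.5):
    """nFET area [nm^2]: (3F x 4F) per fin, times N_f, times (1 + overhead)."""
    fin_pitch, contact_pitch = 3 * f, 4 * f
    return fin_pitch * contact_pitch * n_fins * (1 + overhead)


if __name__ == "__main__":
    print(f"inverter: {inverter_area():.0f} nm^2 = {inverter_area() / 1e6:.5f} um^2")
    print(f"nFET, 1 fin: {nfet_area(1):.0f} nm^2")
```

At F = 7 nm this gives 7840 nm² (about 0.0078 µm²) for an inverter and 882 nm² per fin for an nFET.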



Fig. 5. Comparison between SWD and 7nm CMOS in terms of area (A), energy (E) and delay (D).

Finally, we evaluate the performance of SWD in terms of area, energy and delay, and we compare it with the CMOS implementation at the circuit level, as shown in Fig. 5. A variety of circuits are synthesized using the Synopsys Design Compiler with a cell library adopted from 7nm FinFET technology. SWD-based circuits are evaluated based on MIG optimization [5]. On average, SWD shows a 3x improvement in area compared to CMOS, and up to 10x improvement in energy-delay product (EDP) for large complex circuits with low overhead. For smaller circuits such as ripple carry adders (RCA), the energy overhead of the transducers overshadows any gain in area.

## V. CONCLUSIONS

We have, for the first time, comprehensively investigated the impact of transducers on the performance of SWD logic circuitry. We propose and compare feasible mechanisms for efficiently translating information back and forth between the charge and spin domains. Our study shows that even with the best available options, the transducers act as a bottleneck and impose nearly 20-40% energy overhead. However, SWD can still outperform CMOS at the circuit level for large complex circuits with low overhead.

#### REFERENCES

- [1] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proceedings of the IEEE*, vol. 101, no. 12, pp. 2498–2533, 2013.
- [2] A. Khitun and K. L. Wang, "Non-volatile magnonic logic circuits engineering," *Journal of Applied Physics*, vol. 110, no. 3, p. 034306, 2011.
- [3] S. Dutta, S.-C. Chang, N. Kani, D. E. Nikonov, S. Manipatruni, I. A. Young, and A. Naeemi, "Non-volatile clocked spin wave interconnect for beyond-CMOS nanomagnet pipelines," *Scientific Reports*, vol. 5, 2015.
- [4] O. Zografos, B. Sorée, A. Vaysset, S. Cosemans, L. Amarù, P.-E. Gaillardon, G. De Micheli, R. Lauwereins, S. Sayan, P. Raghavan, et al., "Design and benchmarking of hybrid CMOS-spin wave device circuits compared to 10nm CMOS," in *Proceedings of the 15th International IEEE Conference on Nanotechnology (NANO)*, no. EPFL-CONF-211004, 2015.
- [5] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "Boolean logic optimization in majority-inverter graphs," in *Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE*. IEEE, 2015, pp. 1–6.
- [6] S. Dutta, D. E. Nikonov, S. Manipatruni, I. A. Young, and A. Naeemi, "Phase-dependent deterministic switching of magnetoelectric spin wave detector in the presence of thermal noise via compensation of demagnetization," *Applied Physics Letters*, vol. 107, no. 19, p. 192404, 2015.
- [7] C.-F. Pai, Y. Ou, L. H. Vilela-Leão, D. Ralph, and R. Buhrman, "Dependence of the efficiency of spin Hall torque on the transparency of Pt/ferromagnetic layer interfaces," *Physical Review B*, vol. 92, no. 6, p. 064426, 2015.
- [8] W. Zhang, W. Han, X. Jiang, S.-H. Yang, and S. S. Parkin, "Role of transparency of platinum-ferromagnet interfaces in determining the intrinsic magnitude of the spin Hall effect," *Nature Physics*, vol. 11, no. 6, pp. 496–502, 2015.
- [9] Y. Zhang, Z. Li, C. Deng, J. Ma, Y. Lin, and C.-W. Nan, "Demonstration of magnetoelectric read head of multiferroic heterostructures," *Applied Physics Letters*, vol. 92, no. 15, p. 152510, 2008.
- [10] E. Saitoh, M. Ueda, H. Miyajima, and G. Tatara, "Conversion of spin current into charge current at room temperature: Inverse spin-Hall effect," *Applied Physics Letters*, vol. 88, no. 18, p. 182509, 2006.
- [11] J. R. Sánchez, L. Vila, G. Desfonds, S. Gambarelli, J. Attané, J. De Teresa, C. Magén, and A. Fert, "Spin-to-charge conversion using Rashba coupling at the interface between non-magnetic materials," *Nature Communications*, vol. 4, 2013.
- [12] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Spin-orbit logic with magnetoelectric nodes: A scalable charge mediated nonvolatile spintronic logic," arXiv preprint arXiv:1512.05428, 2015.
- [13] M. J. Donahue and D. G. Porter, *OOMMF User's Guide*. US Department of Commerce, Technology Administration, National Institute of Standards and Technology, 1999.
- [14] S. Ikeda, J. Hayakawa, Y. M. Lee, F. Matsukura, and H. Ohno, "Dependence of tunnel magnetoresistance on ferromagnetic electrode materials in MgO-barrier magnetic tunnel junctions," *Journal of Magnetism and Magnetic Materials*, vol. 310, no. 2, pp. 1937–1939, 2007.